AITopics

2605.2505

Country: Europe > France (0.69)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Aich, Agnideep, Murshed, Md Monzur, Mayeaux, Amanda, Hewage, Sameera

Can Copulas Be Used for Feature Selection? A Machine Learning Study on Diabetes Risk Prediction

arXiv.org Machine LearningMay-29-2025

Accurate diabetes risk prediction relies on identifying key features from complex health datasets, but conventional methods like mutual information (MI) filters and genetic algorithms (GAs) often overlook extreme dependencies critical for high-risk subpopulations. In this study we introduce a feature-selection framework using the upper-tail dependence coefficient (λU) of the novel A2 copula, which quantifies how often extreme higher values of a predictor co-occur with diabetes diagnoses (target variable). Applied to the CDC Diabetes Health Indicators dataset (n=253,680), our method prioritizes five predictors (self-reported general health, high blood pressure, body mass index, mobility limitations, and high cholesterol levels) based on upper tail dependencies. These features match or outperform MI and GA selected subsets across four classifiers (Random Forest, XGBoost, Logistic Regression, Gradient Boosting), achieving accuracy up to 86.5% (XGBoost) and AUC up to 0.806 (Gradient Boosting), rivaling the full 21-feature model. Permutation importance confirms clinical relevance, with BMI and general health driving accuracy. To our knowledge, this is the first work to apply a copula's upper-tail dependence for supervised feature selection, bridging extreme-value theory and machine learning to deliver a practical toolkit for diabetes prevention.

artificial intelligence, copula, machine learning, (11 more...)

2505.22554

Country:

North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
North America > United States > West Virginia (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

arXiv.org Artificial IntelligenceDec-24-2024

Unveiling Secrets of Brain Function With Generative Modeling: Motion Perception in Primates & Cortical Network Organization in Mice

Vafaii, Hadi

This Dissertation is comprised of two main projects, addressing questions in neuroscience through applications of generative modeling. Project #1 (Chapter 4) explores how neurons encode features of the external world. I combine Helmholtz's "Perception as Unconscious Inference" -- paralleled by modern generative models like variational autoencoders (VAE) -- with the hierarchical structure of the visual cortex. This combination leads to the development of a hierarchical VAE model, which I test for its ability to mimic neurons from the primate visual cortex in response to motion stimuli. Results show that the hierarchical VAE perceives motion similar to the primate brain. Additionally, the model identifies causal factors of retinal motion inputs, such as object- and self-motion, in a completely unsupervised manner. Collectively, these results suggest that hierarchical inference underlines the brain's understanding of the world, and hierarchical VAEs can effectively model this understanding. Project #2 (Chapter 5) investigates the spatiotemporal structure of spontaneous brain activity and its reflection of brain states like rest. Using simultaneous fMRI and wide-field Ca2+ imaging data, this project demonstrates that the mouse cortex can be decomposed into overlapping communities, with around half of the cortical regions belonging to multiple communities. Comparisons reveal similarities and differences between networks inferred from fMRI and Ca2+ signals. The introduction (Chapter 1) is divided similarly to this abstract: sections 1.1 to 1.8 provide background information about Project #1, and sections 1.9 to 1.13 are related to Project #2. Chapter 2 includes historical background, Chapter 3 provides the necessary mathematical background, and finally, Chapter 6 contains concluding remarks and future directions.

data mining, machine learning, natural language, (23 more...)

2412.19845

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(13 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(7 more...)

Gross, Dennis, Spieker, Helge, Gotlieb, Arnaud, Knoblauch, Ricardo, Elmansori, Mohamed

Efficient Milling Quality Prediction with Explainable Machine Learning

arXiv.org Artificial IntelligenceSep-16-2024

This paper presents an explainable machine learning (ML) approach for predicting surface roughness in milling. Utilizing a dataset from milling aluminum alloy 2017A, the study employs random forest regression models and feature importance techniques. The key contributions include developing ML models that accurately predict various roughness values and identifying redundant sensors, particularly those for measuring normal cutting force. Our experiments show that removing certain sensors can reduce costs without sacrificing predictive accuracy, highlighting the potential of explainable machine learning to improve cost-effectiveness in machining.

dataset, ml model, prediction, (12 more...)

2409.10203

Country:

North America > United States (0.04)
Europe > United Kingdom > Wales > Cardiff (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Lu, Min, Ishwaran, Hemant

Model-independent variable selection via the rule-based variable priority

arXiv.org Machine LearningSep-16-2024

While achieving high prediction accuracy is a fundamental goal in machine learning, an equally important task is finding a small number of features with high explanatory power. One popular selection technique is permutation importance, which assesses a variable's impact by measuring the change in prediction error after permuting the variable. However, this can be problematic due to the need to create artificial data, a problem shared by other methods as well. Another problem is that variable selection methods can be limited by being model-specific. We introduce a new model-independent approach, Variable Priority (VarPro), which works by utilizing rules without the need to generate artificial data or evaluate prediction error. The method is relatively easy to use, requiring only the calculation of sample averages of simple statistics, and can be applied to many data settings, including regression, classification, and survival. We investigate the asymptotic properties of VarPro and show, among other things, that VarPro has a consistent filtering property for noise variables. Empirical studies using synthetic and real-world data show the method achieves a balanced performance and compares favorably to many state-of-the-art procedures currently used for variable selection.

noise variable, signal variable, varpro, (17 more...)

2409.09003

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.65)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.65)

Maturo, Fabrizio, Porreca, Annamaria

Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations

arXiv.org Machine LearningAug-23-2024

This paper introduces a novel supervised classification strategy that integrates functional data analysis (FDA) with tree-based methods, addressing the challenges of high-dimensional data and enhancing the classification performance of existing functional classifiers. Specifically, we propose augmented versions of functional classification trees and functional random forests, incorporating a new tool for assessing the importance of functional principal components. This tool provides an ad-hoc method for determining unbiased permutation feature importance in functional data, particularly when dealing with correlated features derived from successive derivatives. Our study demonstrates that these additional features can significantly enhance the predictive power of functional classifiers. Experimental evaluations on both real-world and simulated datasets showcase the effectiveness of the proposed methodology, yielding promising results compared to existing methods.

afct, derivative, functional data, (14 more...)

2408.13179

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)

arXiv.org Artificial IntelligenceJul-19-2024

DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks

Jabbour, Sarah, Kondas, Gregory, Kazerooni, Ella, Sjoding, Michael, Fouhey, David, Wiens, Jenna

We propose a permutation-based explanation method for image classifiers. Current image-model explanations like activation maps are limited to instance-based explanations in the pixel space, making it difficult to understand global model behavior. In contrast, permutation based explanations for tabular data classifiers measure feature importance by comparing model performance on data before and after permuting a feature. We propose an explanation method for image-based models that permutes interpretable concepts across dataset images. Given a dataset of images labeled with specific concepts like captions, we permute a concept across examples in the text space and then generate images via a text-conditioned diffusion model. Feature importance is then reflected by the change in model performance relative to unpermuted data. When applied to a set of concepts, the method generates a ranking of feature importance. We show this approach recovers underlying model feature importance on synthetic and real-world image classification tasks.

classifier, concept classifier, feature model, (17 more...)

2407.14509

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Nuclear Medicine (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.70)
(2 more...)

Power, Luke, Guha, Krishnendu

Feature Importance and Explainability in Quantum Machine Learning

arXiv.org Machine LearningMay-14-2024

Many Machine Learning (ML) models are referred to as black box models, providing no real insights into why a prediction is made. Feature importance and explainability are important for increasing transparency and trust in ML models, particularly in settings such as healthcare and finance. With quantum computing's unique capabilities, such as leveraging quantum mechanical phenomena like superposition, which can be combined with ML techniques to create the field of Quantum Machine Learning (QML), and such techniques may be applied to QML models. This article explores feature importance and explainability insights in QML compared to Classical ML models. Utilizing the widely recognized Iris dataset, classical ML algorithms such as SVM and Random Forests, are compared against hybrid quantum counterparts, implemented via IBM's Qiskit platform: the Variational Quantum Classifier (VQC) and Quantum Support Vector Classifier (QSVC). This article aims to provide a comparison of the insights generated in ML by employing permutation and leave one out feature importance methods, alongside ALE (Accumulated Local Effects) and SHAP (SHapley Additive exPlanations) explainers.

feature importance, petal length, prediction, (14 more...)

2405.08917

Country:

North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > Ireland > Munster > County Cork > Cork (0.04)
Europe > Austria > Vienna (0.04)
Africa > Mozambique > Gaza Province > Xai-Xai (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (0.93)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.35)

arXiv.org Artificial IntelligenceApr-30-2023

Predictability of Machine Learning Algorithms and Related Feature Extraction Techniques

Dong, Yunbo

To implement machine learning, it is essential to first determine an appropriate algorithm for the dataset. Different algorithms may produce a large number of different models with different hyperparameter configurations, and it usually takes a lot of time to run the model on a large dataset when the model is relatively complex. Therefore, how to predict the performance of a model on a dataset is an fundamental problem to be solved. This thesis designs a prediction system based on matrix factorization to predict the classification accuracy of a specific model on a particular dataset. In this thesis, we conduct a comprehensive empirical research on more than fifty datasets that we collected from the openml web site.

artificial intelligence, dataset, machine learning algorithm, (11 more...)

2305.00449

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)
(2 more...)